Analysis of a Longitudinal Study on Aging Conducted in Ireland

Diana Ballesteros (deg2163), Christopher Crowe (clc2229), Tanvi Jain (tj2383)


Motivation:
This project aims to look at the effect of sociodemographic indicators on physical and mental health among a cohort of Irish residents aged 50+ years old.

Related work:
This project is a broad examination of the implications of aging on physical and mental health. The inspiration for this analysis was drawn from the group members research interests including depression, aging, and chronic diseases. The following link to a page from the American Psychological Association was used as a reference for the variables we chose to analyze. https://www.apa.org/helpcenter/aging-depression.aspx

Questions:

  • Is physical health associated with mental health?
  • Does loneliness change over time among widowed men?
  • What is the proportion of various ICD-10 diseases among this population?

Evolution of questions:
As we created the graphs we decided to also explore variations according to sex (male and female) and different marital statuses (widowed, married, divorced, cohabitating, separated, and single). For some graphs, we also explored the change over waves 1, 2, and 3.

Evolution of methods:
In our initial proposal we wanted to look at Quality of Life. However, the nature of this variable seemed ambiguous to us so instead we looked at variables such as self-rated mental health, self-rated physical health, and loneliness as proxies for quality of life. We also planned to use a hexagonal heat map to visualize the relationship between physical and mental health, but according to the distribution of our data this map was not visually informative. Instead we used a geom_count to display the concentrations of data within a scatterplot.

Loading the necessary libraries:

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(plotly)
library(MASS)
library(googledrive)
library(httr)

## The MASS package also has a select function, so we need to specify that when we call the select function, we are referring to select from the dplyr package.
select = dplyr::select

Instructions for importing the data:
Download the data from Google Drive to your local R environment (this requires you to sign into your Google account) then reference the data from a local path. The R console may provide you with a link to authenticate your Google account. If so, copy and paste this link into your browser and log into your Google account. After logging in, Google may successfully link to R and you can continue working in R as usual. However, after loggin in, Google may provide you with a authentication code. If so, copy and paste this code into your R console when prompted and continue working in R as usual.

## Save the data within the shiny folder for use in shiny
drive_download(as_id("https://drive.google.com/open?id=12qBx-FbPIKAA_aQr5JoMR9ehvpmRrbWN"), path = "./shiny/34315-0001-Data.rda", overwrite = TRUE)
drive_download(as_id("https://drive.google.com/open?id=1BLislRpc8OTvtYEw8118WKKLkVuHhV6C"), path = "./shiny/37105-0001-Data.rda", overwrite = TRUE)
drive_download(as_id("https://drive.google.com/open?id=1gj53tdb5RHhX5NPDv2eyBqfgEBPAc42G"), path = "./shiny/37106-0001-Data.rda", overwrite = TRUE)
## Save the data outside the shiny folder for use elsewhere in the website
drive_download(as_id("https://drive.google.com/open?id=12qBx-FbPIKAA_aQr5JoMR9ehvpmRrbWN"), path = "./34315-0001-Data.rda", overwrite = TRUE)
drive_download(as_id("https://drive.google.com/open?id=1BLislRpc8OTvtYEw8118WKKLkVuHhV6C"), path = "./37105-0001-Data.rda", overwrite = TRUE)
drive_download(as_id("https://drive.google.com/open?id=1gj53tdb5RHhX5NPDv2eyBqfgEBPAc42G"), path = "./37106-0001-Data.rda", overwrite = TRUE)

load("./34315-0001-Data.rda")
load("./37105-0001-Data.rda")
load("./37106-0001-Data.rda")

Tidying the data:
The following code creates a number of functions to clean variables that are used in our analyses. It also tidies the data and prepares dataframes to be used for visualizations.

## Create a function to clean the wave variable
clean_wave = function(x) {
  
  x %>% 
    mutate(
      wave = case_when(
        str_detect(wave, "_1$") ~ 1,
        str_detect(wave, "_2$") ~ 2,
        str_detect(wave, "_3$") ~ 3
    )
  )
  
}

## Create a function to clean the self-rated physical health variable
clean_physical = function(x) {
  
  x %>% 
    mutate(
      physical = case_when(
        physical_value == "(1) Excellent" ~ "Excellent", 
        physical_value == "(2) Very good" ~ "Very Good",
        physical_value == "(3) Good" ~ "Good", 
        physical_value == "(4) Fair" ~ "Fair", 
        physical_value == "(5) Poor" ~ "Poor",
        physical_value == "(01) Excellent" ~ "Excellent", 
        physical_value == "(02) Very Good" ~ "Very Good",
        physical_value == "(03) Good" ~ "Good", 
        physical_value == "(04) Fair" ~ "Fair", 
        physical_value == "(05) Poor" ~ "Poor"),
      physical = factor(physical, levels = c("Poor", "Fair", "Good", "Very Good", "Excellent"))
    )
  
}

## Create a function to clean the self-rated mental health variable
clean_mental = function(x) {
  
  x %>% 
    mutate(
      mental = case_when(
        mental_value == "(1) Excellent" ~ "Excellent", 
        mental_value == "(2) Very good" ~ "Very Good",
        mental_value == "(3) Good" ~ "Good", 
        mental_value == "(4) Fair" ~ "Fair", 
        mental_value == "(5) Poor" ~ "Poor",
        mental_value == "(01) Excellent" ~ "Excellent", 
        mental_value == "(02) Very Good" ~ "Very Good",
        mental_value == "(03) Good" ~ "Good", 
        mental_value == "(04) Fair" ~ "Fair", 
        mental_value == "(05) Poor" ~ "Poor"),
      mental = factor(mental, levels = c("Poor", "Fair", "Good", "Very Good", "Excellent"))
    )
}

## Create a function to clean the marital status variable
clean_marital = function(x) {
  
  x %>% 
    mutate(
      marital = case_when(
        CS006 == "(1) Married" ~ "Married",
        CS006 == "(2) Living with a partner as if married" ~ "Living w/ Partner",
        CS006 == "(3) Single (never married)" ~ "Never married",
        CS006 == "(4) Separated" ~ "Separated",
        CS006 == "(5) Divorced" ~ "Divorced",
        CS006 == "(6) Widowed" ~ "Widowed"),
      marital = factor(marital, levels = c("Married", "Living w/ Partner", "Never married",
                                           "Separated", "Divorced", "Widowed"))
    )
  
}

## Create a function to clean the sex variable
clean_sex = function(x) {
  
   x %>% 
    mutate(
      sex = case_when(
        SEX == "(1) Male" ~ "Male",
        SEX == "(2) Female" ~ "Female")
    )
  
}

## Assign data from each wave to a new object where common variable names are given unique names
wave_1_data = da34315.0001 %>% 
  mutate(MHUCLA_LONELINESS_1 = MHUCLA_LONELINESS,
         PH001_1 = PH001,
         PH002_1 = PH002
         )
 
wave_2_data = da37105.0001 %>% 
  mutate(MHUCLA_LONELINESS_2 = MHUCLA_LONELINESS,
         PH001_2 = PH001,
         PH002_2 = PH002
         )
      
wave_3_data = da37106.0001 %>% 
  mutate(MHUCLA_LONELINESS_3 = MHUCLA_LONELINESS,
         PH001_3 = PH001,
         PH002_3 = PH002
         )

## Tidy data for self-report physical health, including sex, marital status, and ID
overall_physical = wave_1_data %>%  
  merge(wave_2_data, by = "ID") %>% 
  merge(wave_3_data, by = "ID") %>% 
  select(PH001_1, PH001_2, PH001_3, SEX.x, CS006, ID) %>% 
  gather(wave, physical_value, PH001_1:PH001_3) %>% 
  clean_wave()
  
## Tidy data for self-report mental health, including sex, marital status, and ID
overall_mental = wave_1_data %>%  
  merge(wave_2_data, by = "ID") %>% 
  merge(wave_3_data, by = "ID") %>% 
  select(PH002_1, PH002_2, PH002_3, SEX.x, CS006, ID) %>% 
  gather(wave, mental_value, PH002_1:PH002_3) %>% 
  clean_wave()

## Tidy data for loneliness, including sex, marital status, and ID
overall_loneliness = wave_1_data %>%  
  merge(wave_2_data, by = "ID") %>% 
  merge(wave_3_data, by = "ID") %>% 
  select(MHUCLA_LONELINESS_1, MHUCLA_LONELINESS_2, MHUCLA_LONELINESS_3, SEX.x, CS006, ID) %>% 
  gather(wave, loneliness_value, MHUCLA_LONELINESS_1:MHUCLA_LONELINESS_3) %>% 
  clean_wave()

## Tidy data for ICD codes,including sex, marital status, and ID 
overall_icd = wave_1_data %>%  
  merge(wave_3_data, by = "ID") %>% 
  select(ICD10_01:ICD10_16, ID, SEX.x, CS006) %>% 
  mutate(SEX = SEX.x) %>% 
  clean_sex() %>% 
  clean_marital() %>% 
  select(-SEX.x, -SEX, -CS006)
  
## Combine data for self-report physical and mental health, apply functions to create consistent, clean labels for variables
phys_ment_lone = overall_physical %>% 
  merge(overall_mental, by = c("ID", "wave")) %>% 
  merge(overall_loneliness, by = c("ID", "wave")) %>% 
  select(ID, wave, SEX.x, CS006.x, physical_value, mental_value, loneliness_value) %>% 
  clean_physical() %>% 
  clean_mental() %>% 
  mutate(
    CS006 = CS006.x,
    SEX = SEX.x
  ) %>% 
  clean_marital() %>% 
  clean_sex() %>% 
  select(-physical_value, -mental_value, -SEX, -CS006, -SEX.x, -CS006.x)

## Prepare data for loneliness plot
lone_data = 
  wave_1_data %>% 
  merge(wave_2_data, by = "ID") %>% 
  merge(wave_3_data, by = "ID") %>% 
  select(SEX.x, CS006, ID, MHUCLA_LONELINESS_1, MHUCLA_LONELINESS_2, MHUCLA_LONELINESS_3)  %>% 
  mutate(
    SEX = SEX.x
  ) %>% 
  clean_sex() %>% 
  clean_marital() %>% 
  select(-SEX.x, -CS006, -SEX)

Data source:
The data was obtained from The Irish Longitudinal Study on Ageing (TILDA) Waves 1, 2, and 3. Here is a link to the dataset https://www.icpsr.umich.edu/icpsrweb/ICPSR/series/726. To access information (e.g. codebooks) on this website, you will need to create a free account. We did not delete any participants from our dataset to ensure completeness of our code, in case a participant who is null right now responds in a future wave. The sample size for each wave (1, 2, 3) was n =8504, n = 7207, and n = 6400, respectively. Not all participants who were in Wave 1 were a part of Waves 2 and 3 (i.e. they were lost to follow up).

Exploratory analysis

The code below shows how we created visualizations in a Shiny dashboard. We used shiny to stratify the plots by various combinations of sex, marital status, and time. We provide select examples for each type of plot below. If you would like to view the full results of our exploratory visualizations and analyses, please refer to the Shiny Data Visualizations and Shiny Data Analysis tabs on our website.

  • Shiny Sidebar:
sex_choice = phys_ment_lone %>% distinct(sex) %>% pull()
marital_choice = phys_ment_lone %>% distinct(marital) %>% pull()
wave_choice = phys_ment_lone %>% distinct(wave) %>% pull()
radioButtons("wave_choice", label = h3("Choose wave"),
    choices = wave_choice,
    selected = 1)
radioButtons("sex_choice", label = h3("Choose sex"),
    choices = sex_choice, 
    selected = "Male")
radioButtons("marital_choice", label = h3("Choose marital status"),
    choices = marital_choice, 
    selected = "Married")
  • Bar Graph – Proportion of Incident Disease Burden in Wave 3 Attributable to Specified Disease:

In this plot we calculated the sum of people who had each ICD-10 disease in wave 3 in order to calculate the proportion of people with each disease, allowing us to visualize the burden of each disease in this population. We used our functions to clean the values for variables of interest, and we then created a bar graph to show the proportions and ordered the bars according to decreasing burden. We focused only on Wave 3 for this graph because we were interested in only visualizing the incident diseases in the most recent data collection phase. We also incorporated a color scale to depict the different burdens (with red corresponding to the most burdensome disease category).

renderPlotly({
icd_data = overall_icd %>% 
  filter(sex == input$sex_choice & marital == input$marital_choice) %>% 
  summarize(total_01 = sum(ICD10_01), total_02 = sum(ICD10_02), total_03 = sum(ICD10_03), 
            total_04 = sum(ICD10_04), total_05 = sum(ICD10_05), total_06 = sum(ICD10_06), 
            total_07 = sum(ICD10_07), total_08 = sum(ICD10_08), total_09 = sum(ICD10_09),
            total_10 = sum(ICD10_10), total_11 = sum(ICD10_11), total_12 = sum(ICD10_12),
            total_13 = sum(ICD10_13), total_14 = sum(ICD10_14), total_15 = sum(ICD10_15),
            total_16 = sum(ICD10_16)) %>% 
  gather(key = icd_code, value = total, total_01:total_16) %>% 
  separate(icd_code, into = c("total_char","icd_code"), sep = "_") %>% 
  select(-total_char) %>% 
  mutate(burden = total/sum(total)*100) %>% 
  mutate(disease = case_when(
    icd_code == "01" ~ "Infectious diseases",
    icd_code == "02" ~ "Neoplasms",
    icd_code == "03" ~ "Blood diseases",
    icd_code == "04" ~ "Nutritional/metabolic diseases",
    icd_code == "05" ~ "Mental/behavioral disorders",
    icd_code == "06" ~ "Nervous system diseases",
    icd_code == "07" ~ "Eye diseases",
    icd_code == "08" ~ "Ear diseases",
    icd_code == "09" ~ "Circulatory system diseases",
    icd_code == "10" ~ "Respiratory system diseases",
    icd_code == "11" ~ "Digestive system diseases",
    icd_code == "12" ~ "Skin diseases",
    icd_code == "13" ~ "Musculoskeletal system diseases",
    icd_code == "14" ~ "Genitourinary system diseases",
    icd_code == "15" ~ "Perinatal conditions",
    icd_code == "16" ~ "Congenital malformations"
  ))
icd_plot = 
  icd_data %>% 
  mutate(disease = reorder(disease, -burden)) %>% 
  ggplot(aes(x = disease, y = burden, fill = disease)) +
  geom_bar(stat = "identity") + 
  labs(
    x = "Disease",
    y = "Burden of Disease (%)"
  ) + 
  theme_bw() +
  theme(axis.text.x = element_text(angle = 65, size = 7)) + 
  theme(legend.position = "none") 

ggplotly(icd_plot, tooltip = c("x", "y"))
})
  • Example - Married males:
icd_data = overall_icd %>% 
  filter(sex == 'Male' & marital == 'Married') %>% 
  summarize(total_01 = sum(ICD10_01), total_02 = sum(ICD10_02), total_03 = sum(ICD10_03), 
            total_04 = sum(ICD10_04), total_05 = sum(ICD10_05), total_06 = sum(ICD10_06), 
            total_07 = sum(ICD10_07), total_08 = sum(ICD10_08), total_09 = sum(ICD10_09),
            total_10 = sum(ICD10_10), total_11 = sum(ICD10_11), total_12 = sum(ICD10_12),
            total_13 = sum(ICD10_13), total_14 = sum(ICD10_14), total_15 = sum(ICD10_15),
            total_16 = sum(ICD10_16)) %>% 
  gather(key = icd_code, value = total, total_01:total_16) %>% 
  separate(icd_code, into = c("total_char","icd_code"), sep = "_") %>% 
  select(-total_char) %>% 
  mutate(burden = total/sum(total)*100) %>% 
  mutate(disease = case_when(
    icd_code == "01" ~ "Infectious diseases",
    icd_code == "02" ~ "Neoplasms",
    icd_code == "03" ~ "Blood diseases",
    icd_code == "04" ~ "Nutritional/metabolic diseases",
    icd_code == "05" ~ "Mental/behavioral disorders",
    icd_code == "06" ~ "Nervous system diseases",
    icd_code == "07" ~ "Eye diseases",
    icd_code == "08" ~ "Ear diseases",
    icd_code == "09" ~ "Circulatory system diseases",
    icd_code == "10" ~ "Respiratory system diseases",
    icd_code == "11" ~ "Digestive system diseases",
    icd_code == "12" ~ "Skin diseases",
    icd_code == "13" ~ "Musculoskeletal system diseases",
    icd_code == "14" ~ "Genitourinary system diseases",
    icd_code == "15" ~ "Perinatal conditions",
    icd_code == "16" ~ "Congenital malformations"
  ))
icd_plot = 
  icd_data %>% 
  mutate(disease = reorder(disease, -burden)) %>% 
  ggplot(aes(x = disease, y = burden, fill = disease)) +
  geom_bar(stat = "identity") + 
  labs(
    x = "Disease",
    y = "Burden of Disease (%)"
  ) + 
  theme_bw() +
  theme(axis.text.x = element_text(angle = 65, size = 7)) + 
  theme(legend.position = "none") 

ggplotly(icd_plot, tooltip = c("x", "y"))
  • Spaghetti Plot - Trends in Loneliness Scores by Baseline Loneliness Scores:

We started by renaming the loneliness variables from each wave so that we could identify which wave they correspond to after merging the three waves together by ID. Again, we used our functions to clean the values under the variables of interest. We then created a spaghetti plot to show the change in loneliness across waves. A large change we made once we saw our original chaotic plot was that we decided to depict the different trends across waves for each baseline loneliness value (scale from 1-10). Although there is no one at loneliness 0 we kept this in our code for future replication of this code in future studies, if a participant were to respond as 0 at baseline. We created separate datasets for each loneliness value at baseline using a for loop. The resulting spaghetti plot shows the trend across waves by baseline loneliness values. We also incorporated a color grid to depict the different trends (with red corresponding to the highest, worst value of 10).

renderPlotly({
for (i in 0:10) {
  
  data = lone_data %>% 
  filter(MHUCLA_LONELINESS_1 == i & sex == input$sex_choice & marital == input$marital_choice) %>% 
  select(ID, MHUCLA_LONELINESS_1, MHUCLA_LONELINESS_2, MHUCLA_LONELINESS_3) %>% 
  gather(key = wave, value = loneliness_value, MHUCLA_LONELINESS_1:MHUCLA_LONELINESS_3) %>% 
  clean_wave() %>% 
  filter(loneliness_value != "NA")
  
  name = paste("baseline_loneliness_", i, sep = "")
  assign(name, data )
  
}
  
lone_data = baseline_loneliness_0 %>% 
  ggplot(aes(x = wave, y = loneliness_value)) +
  geom_smooth(se = FALSE, color = "#47FF00") +
  geom_smooth(data = baseline_loneliness_1, aes(x = wave, y = loneliness_value), se = FALSE, color = "#6AFF00") +
  geom_smooth(data = baseline_loneliness_2, aes(x = wave, y = loneliness_value, color = loneliness_value), se = FALSE, color = "#8DFF00") +
  geom_smooth(data = baseline_loneliness_3, aes(x = wave, y = loneliness_value, color = loneliness_value), se = FALSE, color = "#B0FF00") +
  geom_smooth(data = baseline_loneliness_4, aes(x = wave, y = loneliness_value), se = FALSE, color = "#D4FF00") +
  geom_smooth(data = baseline_loneliness_5, aes(x = wave, y = loneliness_value), se = FALSE, color = "#F7FF00") +
  geom_smooth(data = baseline_loneliness_6, aes(x = wave, y = loneliness_value), se = FALSE, color = "#FFE400") +
  geom_smooth(data = baseline_loneliness_7, aes(x = wave, y = loneliness_value), se = FALSE, color = "#FFC100") +
  geom_smooth(data = baseline_loneliness_8, aes(x = wave, y = loneliness_value), se = FALSE, color = "#FF9E00") +
  geom_smooth(data = baseline_loneliness_9, aes(x = wave, y = loneliness_value), se = FALSE, color = "#FF7B00") +
  geom_smooth(data = baseline_loneliness_10, aes(x = wave, y = loneliness_value), se = FALSE, color = "#FF5700") +
  scale_y_continuous(breaks = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) +
  scale_x_continuous(breaks = c(1, 2, 3)) +
  theme_bw() +
  labs(
    x = "Wave",
    y = "UCLA Loneliness Score"
  ) 
  
ggplotly(lone_data)
})
  • Example - Married males:
for (i in 0:10) {
  
  data = lone_data %>% 
  filter(MHUCLA_LONELINESS_1 == i & sex == 'Male' & marital == 'Married') %>% 
  select(ID, MHUCLA_LONELINESS_1, MHUCLA_LONELINESS_2, MHUCLA_LONELINESS_3) %>% 
  gather(key = wave, value = loneliness_value, MHUCLA_LONELINESS_1:MHUCLA_LONELINESS_3) %>% 
  clean_wave() %>% 
  filter(loneliness_value != "NA")
  
  name = paste("baseline_loneliness_", i, sep = "")
  assign(name, data )
  
}
  
lone_data = baseline_loneliness_0 %>% 
  ggplot(aes(x = wave, y = loneliness_value)) +
  geom_smooth(se = FALSE, color = "#47FF00") +
  geom_smooth(data = baseline_loneliness_1, aes(x = wave, y = loneliness_value), se = FALSE, color = "#6AFF00") +
  geom_smooth(data = baseline_loneliness_2, aes(x = wave, y = loneliness_value, color = loneliness_value), se = FALSE, color = "#8DFF00") +
  geom_smooth(data = baseline_loneliness_3, aes(x = wave, y = loneliness_value, color = loneliness_value), se = FALSE, color = "#B0FF00") +
  geom_smooth(data = baseline_loneliness_4, aes(x = wave, y = loneliness_value), se = FALSE, color = "#D4FF00") +
  geom_smooth(data = baseline_loneliness_5, aes(x = wave, y = loneliness_value), se = FALSE, color = "#F7FF00") +
  geom_smooth(data = baseline_loneliness_6, aes(x = wave, y = loneliness_value), se = FALSE, color = "#FFE400") +
  geom_smooth(data = baseline_loneliness_7, aes(x = wave, y = loneliness_value), se = FALSE, color = "#FFC100") +
  geom_smooth(data = baseline_loneliness_8, aes(x = wave, y = loneliness_value), se = FALSE, color = "#FF9E00") +
  geom_smooth(data = baseline_loneliness_9, aes(x = wave, y = loneliness_value), se = FALSE, color = "#FF7B00") +
  geom_smooth(data = baseline_loneliness_10, aes(x = wave, y = loneliness_value), se = FALSE, color = "#FF5700") +
  scale_y_continuous(breaks = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) +
  scale_x_continuous(breaks = c(1, 2, 3)) +
  theme_bw() +
  labs(
    x = "Wave",
    y = "UCLA Loneliness Score"
  ) 
  
ggplotly(lone_data)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
  • Count Plot - Self-Rated Physical vs. Mental Health Status:

We created a count plot for self-rated physical vs. mental health status. In this graph, points with larger circles represent a larger number of responses. We can see that many individuals self-reported having the same status for both physical and mental health.

renderPlotly({
  
  phys_ment_plot = phys_ment_lone %>%
  filter(sex == input$sex_choice & marital == input$marital_choice & wave == input$wave_choice ) %>% 
  ggplot(aes(x = physical, y = mental, color = ..n..)) +
  geom_count(alpha = 0.8) +
  labs(
    x = "Self-Rated Physical Health",
    y = "Self-Rated Mental Health"
  ) +
  theme_bw() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 65, hjust = 1))
ggplotly(phys_ment_plot)
})
  • Example - Married males, Wave 3:
  phys_ment_plot = phys_ment_lone %>%
  filter(sex == 'Male' & marital == 'Married' & wave == 3) %>% 
  ggplot(aes(x = physical, y = mental, color = ..n..)) +
  geom_count(alpha = 0.8) +
  labs(
    x = "Self-Rated Physical Health",
    y = "Self-Rated Mental Health"
  ) +
  theme_bw() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 65, hjust = 1))
ggplotly(phys_ment_plot)
  • Preliminary chi-Squared test of independence:

The code below calcualtes the observed and expected cell counts for the tabulation of self-rated mental health and self-rated physical health for all individuals in Wave 3. We chose to focus on Wave 3 becuase we were interested in only the most recent phase of data collection. After examining the observed and expected counts, we also calculated the chi-squared test statistic, degrees of freedom, and corresponding p-value.

phys_ment_lone =
  phys_ment_lone %>%
  filter(wave == 3)
 
chi = chisq.test(table(phys_ment_lone$mental, phys_ment_lone$physical))
 
observed = chi$observed %>%
  knitr::kable()
 
expected = chi$expected %>%
  knitr::kable()
 
results = chi %>%
  broom::tidy() %>%
  janitor::clean_names() %>%
  mutate(df = parameter,
         chi_stat = statistic) %>%
  select(chi_stat, df, p_value) %>%
  knitr::kable()

Below are the observed cell counts.

Poor Fair Good Very Good Excellent
Poor 37 17 10 5 3
Fair 65 249 137 51 6
Good 51 372 1268 403 60
Very Good 35 191 519 1315 185
Excellent 15 69 168 324 631

Below are the expected cell counts.

Poor Fair Good Very Good Excellent
Poor 2.362755 10.45199 24.46557 24.41901 10.30068
Fair 16.670546 73.74458 172.61817 172.28969 72.67701
Good 70.685742 312.68865 731.92822 730.53540 308.16198
Very Good 73.672001 325.89880 762.84998 761.39832 321.18089
Excellent 39.608956 175.21597 410.13805 409.35758 172.67944

Below are the results of the chi-squared test of independence.

chi_stat df p_value
4078.847 16 0

Additional Anaylses

  • Additional chi-squared tests of independence (stratified):

Based on the results from the preliminary analysis, we decided to run additional tests within specific subsets of our population using Shiny.

## Observed Cell Counts

renderTable({
  
phys_ment_lone =
  phys_ment_lone %>% 
  filter(sex == input$sex_choice & marital == input$marital_choice & wave == input$wave_choice)
  
chi = chisq.test(table(phys_ment_lone$mental, phys_ment_lone$physical))

observed = chi$observed

as.data.frame.matrix(observed) %>% 
  mutate("Health Status" = c("Poor", "Fair", "Good", "Very Good", "Excellent")) %>% 
  select("Health Status", "Poor", "Fair", "Good", "Very Good", "Excellent")
  
})


## Expected Cell Counts


renderTable({
  
  phys_ment_lone =
  phys_ment_lone %>% 
  filter(sex == input$sex_choice & marital == input$marital_choice & wave == input$wave_choice)
  
chi = chisq.test(table(phys_ment_lone$mental, phys_ment_lone$physical))

expected = chi$expected

as.data.frame.matrix(expected) %>% 
  mutate("Health Status" = c("Poor", "Fair", "Good", "Very Good", "Excellent")) %>% 
  select("Health Status", "Poor", "Fair", "Good", "Very Good", "Excellent")
  
})

## Chi-Squared Test of Independence

renderTable({
  
  phys_ment_lone =
  phys_ment_lone %>% 
  filter(sex == input$sex_choice & marital == input$marital_choice & wave == input$wave_choice)
  
chi = chisq.test(table(phys_ment_lone$mental, phys_ment_lone$physical))

results = chi %>% 
  broom::tidy() %>% 
  janitor::clean_names() %>% 
  mutate(df = format(round((parameter), digits = 0)),
         Chi2 = statistic,
         p = p_value) %>% 
  select(Chi2, df, p) 

as.data.frame.matrix(results) 

  
})

Discussion:

  • Graph 1:
    Our hypothesis for this plot was more broad in that we expected the largest burden of disease to come from chronic diseases, simply because the cohort is older and in Ireland, which may not have a high prevalence of infectious, or perinatal diseases and conditions. The bar graph specified that the 4 diseases with the highest proportion were circulatory system diseases, eye diseases, nutritional/metabolic diseases, and musculoskeletal system diseases. As predicted there were no incident cases of infectious diseases or perinatal conditions. Additionally, there were no cases of congenital malformations, ear diseases, or skin conditions. Interestingly, across sex, we noticed that the most prevalent disease among men was circulatory system diseases but among women it was nutritional/metabolic diseases.

  • Graph 2:
    Based on research suggesting that women are better equipped to rebuild social networks after the loss of a partner, we expected that loneliness scores would increase over time for widowed men more than widowed women. However, according to graph 2, this was not the case. Most widowed men tended to exhibit a decrease in loneliness scores compared to baseline (indicative of declines in feelings of loneliness), whereas widowed women stayed at relatively the same loneliness level across all waves with the exception of an an outlier at level 8 loneliness.

  • Graph 3:
    We anticipated that physical health and mental health would be correlated and from graph 3 we discovered that this was true. Those who felt they had better physical health also had better mental health and vice versa. We also found a larger concentration of older adults reporting Very Good or Good physical and mental health. This trend remained consistent across sex and over time. While this means that there was minimal improvement in physical and mental health over time, it also means there was a minimal decline in physical and mental health over time.

  • Preliminary chi-Squared test of independence:
    Since our graph depicted a correlation between self-rated mental health and physical health we anticipated an association. Our null hypothesis was that there is no association between self-rated mental health and physical health and the alternate hypothesis was that there is an association between self-rated mental health and physical health. We set alpha to 0.05 and calculated the chi-squared test statistic (4078.8) and p-value (0), which was significant. At the 5% level of significance, there is enough evidence to reject the null hypothesis and conclude that there is an association between mental health and physical health, which matches graph 1.

  • Additional chi-squared tests of independence:
    Based on these results, we used Shiny to further stratify the analysis to explore the association between self-rated mental health and physical health within specific subsets of our study population. At the 5% level of significance, among married females in wave 1, there is enough evidence to reject the null hypothesis and conclude that there is an association between mental health and physical health.